Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories
Abstract
How the speech production and perception systems evolved in humans remains a mystery. Previous research suggests that human auditory systems are able, and have possibly evolved, to preserve maximal information about the speaker's articulatory gestures. This paper attempts an initial step towards answering the complementary question of whether speakers' articulatory mechanisms have also evolved to produce sounds that can be optimally discriminated by the listener's auditory system. To this end, we use computational methods to explicitly model the extent to which derived representations of "primitive movements" of speech articulation can be used to discriminate between broad phone categories. We extract interpretable spatio-temporal primitive movements as recurring patterns in a data matrix of human speech articulation, i.e., a matrix representing the trajectories of vocal tract articulators over time. For this purpose, we propose a weakly-supervised learning method that attempts to find a part-based representation of the data in terms of recurring basis trajectory units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases over different time-lags. We show that this feature, derived entirely from the activations of these primitive movements, achieves better discrimination than conventional features on an interval-based phone classification task. We discuss the implications of these findings for our understanding of speech signal representations and of the links between speech production and perception systems.
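The lagged co-occurrence feature described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `lagged_cooccurrence`, the `(K, T)` layout of the activation matrix `H` (K bases, T frames within one phone interval), the `max_lag` parameter, and the choice to sum products of activations without any normalization.

```python
import numpy as np

def lagged_cooccurrence(H, max_lag):
    """Hypothetical interval-level feature from basis activations.

    H       : (K, T) array of nonnegative activations for one phone
              interval (K bases, T frames). Layout is an assumption.
    max_lag : largest time-lag (in frames) to include.

    Returns a vector of length K * K * (max_lag + 1) whose entries are
    C_lag[i, j] = sum_t H[i, t] * H[j, t + lag].
    """
    K, T = H.shape
    feats = []
    for lag in range(max_lag + 1):
        if lag >= T:
            # Interval shorter than the lag: no frame pairs to count.
            C = np.zeros((K, K))
        else:
            # Pair each frame t with frame t + lag and accumulate
            # the outer products of the activation vectors.
            C = H[:, :T - lag] @ H[:, lag:].T
        feats.append(C.ravel())
    return np.concatenate(feats)

# Toy example: 2 bases, 3 frames, lags 0 and 1.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
feature = lagged_cooccurrence(H, max_lag=1)
```

Because the output length depends only on `K` and `max_lag`, phone intervals of different durations map to vectors of the same size, which is what an interval-based classifier needs. Whether the original work normalizes by interval length or weights the lags differently is not stated in the abstract.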
Similar articles
On the Nature of Data-driven Primitive Representations of Speech Articulation
A long standing view in speech production research posits that articulatory representations are low dimensional. Conceptual and computational models have been built based on this view. In this work we explore the nature of low dimensional representations derived directly from articulatory signals based on sparsity constraints. Specifically, we present a method to examine how well derived repres...
Continuous speech recognition using articulatory data
In this paper we show that there is measurable information in the articulatory system which can help to disambiguate the acoustic signal. We measure directly the movement of the lips, tongue, jaw, velum and larynx and parameterise this articulatory feature space using principal components analysis. The parameterisation is developed and evaluated using a speaker dependent phone recognition task ...
Pseudo-articulatory representations: promise, progress and problems
Pseudo-Articulatory Representations (PARs) have been proposed and discussed in relation to speech processing with results reported for both synthesis and recognition. PARs are derived from linguistic specifications of articulatory activity which are both abstract and idealized. The abstractions and idealizations permit the linguistic generality to be distinguished from the articulatory reality;...
Human Feature Extraction the Role of the Articulatory Rhythm
Neuro-physical investigations [1] hint to a new paradigm for feature extraction not used in ASR. This paradigm is based on synchronized brain to brain oscillations, active during speech production and speech perception. This mechanism leads to an evolving theory, the author calls the Unified Theory of Human Speech Processing (UTHSP). The core elements of this theory are the articulatory rhythm ...
Evaluation of Juncture Strength using Articulatory Synthesis of Prosodic Gestures and Functional Data Analysis
Prosodic boundary gestures (pi-gestures) (Byrd & Saltzman, J. Phon., 2003) have been introduced to model the local slowing or lengthening of articulatory gestures in the vicinity of phrase boundaries. In this paper, pi-gestures are simulated within the TaDA task dynamics computational model and examined using functional data analysis (FDA) to evaluate articulatory lengthening in terms of underl...
Journal title:
- Computer speech & language
Volume 36, Issue
Pages -
Publication date: 2016